Biological interaction: Genetic factor(s) and environmental factor(s) participate in the same causal mechanism (Rothman et al., 2008)
Statistical interaction using linear regression (unrelated individuals):
\(y = \mu + \beta_g x_g + \beta_e x_e + \beta_{int} x_g x_e + e\)
(Aschard et al., 2012, HumGen)
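A minimal sketch of this statistical interaction test for unrelated individuals. It assumes a hypothetical additive SNP coded 0/1/2 with allele frequency p and a binary exposure with frequency f. The effect sizes are arbitrary illustration values, and the model is fitted by ordinary least squares with NumPy:

```python
import numpy as np

def fit_gxe_ols(y, x_g, x_e):
    """OLS fit of y = mu + b_g*x_g + b_e*x_e + b_int*x_g*x_e + e.

    Returns the estimates [mu, b_g, b_e, b_int] and their standard errors.
    """
    X = np.column_stack([np.ones_like(x_g), x_g, x_e, x_g * x_e])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = resid @ resid / (n - k)              # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)    # OLS covariance of the estimates
    return beta, np.sqrt(np.diag(cov_beta))

# Hypothetical data: additive SNP (0/1/2, allele frequency p) and binary exposure (frequency f)
rng = np.random.default_rng(1)
n, p, f = 5000, 0.3, 0.4
x_g = rng.binomial(2, p, size=n).astype(float)
x_e = rng.binomial(1, f, size=n).astype(float)
y = 0.1 * x_g + 0.2 * x_e + 0.3 * x_g * x_e + rng.normal(size=n)

beta, se = fit_gxe_ols(y, x_g, x_e)
z_int = beta[3] / se[3]                           # Wald statistic for H0: b_int = 0
print(f"beta_int = {beta[3]:.3f} (SE {se[3]:.3f}), Z = {z_int:.1f}")
```

The interaction coefficient is the last column of the design matrix. Its test only makes sense with both main effects kept in the model.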
| Consortium | Sample size | Exposure | Outcome | Reference |
|---|---|---|---|---|
| CHARGE + SpiroMeta | 50,047 | Smoking | Pulmonary function | (Hancock et al., 2012) |
| SUNLIGHT | 35,000 | Vitamin D intake | Circulating Vitamin D level | (Wang et al., 2010) |
| GIANT | up to 339,224 | Gender | Anthropometric traits | (Heid et al., 2010) |
| … | | | | |
Study: G x smoking in pulmonary function outcomes (Hancock et al., 2012)
Findings: three novel gene regions
Figure: G x ever-smoking in FEV1/FVC (Hancock et al., 2012)
Abbreviations: FEV1, Forced Expiratory Volume in 1 second; FVC, Forced Vital Capacity
Study: G x gender in the Genetic Investigation of Anthropometric Traits (GIANT) consortium
Findings:
Abbreviations: WHR, Waist-hip ratio
Stratified framework described in Magi et al. (2010) and Randall et al. (2013)
Statistical power for interaction tests is lower than for similar tests of marginal genetic effects (Murcray et al., 2011)
Interaction testing also faces other potential issues (Aschard et al., 2012):
Relatedness is yet another layer of complexity in GxE analysis,
whose impact on the full and stratified GxE frameworks is seldom explored.
Assess the relative performance of GxE methods
in the presence of structure
Methods to account for relatedness are relatively well established
in marginal association studies (GWAS)
\(y = X \beta + g + f + e\)
\(\mbox{where } g \perp f \perp e\)
\(\mbox{implying}\)
\(y \sim (X \beta, \sigma_g^2 K + \sigma_f^2 F + \sigma_r^2 I) = (X \beta, V)\)
Marginal test: \(X \beta = \mu + \beta_g x_g\)
GxE test: \(X \beta = \mu + \beta_g x_g + \beta_e x_e + \beta_{int} x_g x_e\)
(Lynch and Walsh, 1998)
Genetic relationship matrix \(K\)
Shared environment matrix \(F\)
Identity matrix \(I\) for the residual term
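A minimal sketch of how an outcome with this covariance structure could be simulated. It assumes a hypothetical nuclear-family design of two unrelated parents plus two full sibs per family. K is taken as the expected genetic relationship matrix (twice the kinship coefficient), and F marks all members of the same household. Variance-component values are arbitrary:

```python
import numpy as np

def nuclear_family_matrices(n_families):
    """Hypothetical design: each family = 2 unrelated parents + 2 full-sib children.

    K: expected genetic relationship matrix (2 x kinship); F: shared household indicator.
    """
    k_block = np.array([
        [1.0, 0.0, 0.5, 0.5],   # father
        [0.0, 1.0, 0.5, 0.5],   # mother
        [0.5, 0.5, 1.0, 0.5],   # child 1
        [0.5, 0.5, 0.5, 1.0],   # child 2
    ])
    f_block = np.ones((4, 4))   # all four members share the household environment
    eye = np.eye(n_families)
    return np.kron(eye, k_block), np.kron(eye, f_block)

def simulate_outcome(mean, K, F, s2_g, s2_f, s2_r, rng):
    """Draw y ~ N(mean, s2_g*K + s2_f*F + s2_r*I) using a Cholesky factor of V."""
    V = s2_g * K + s2_f * F + s2_r * np.eye(len(mean))
    L = np.linalg.cholesky(V)
    return mean + L @ rng.standard_normal(len(mean)), V

rng = np.random.default_rng(2)
K, F = nuclear_family_matrices(200)                      # 800 individuals
y, V = simulate_outcome(np.zeros(K.shape[0]), K, F, 0.4, 0.2, 0.4, rng)
```

The matrices are block diagonal because families are assumed independent of each other. For large samples one would exploit that structure instead of building dense matrices.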
\(\hat{V} = \hat{\sigma}_g^2 K + \hat{\sigma}_f^2 F + \hat{\sigma}_r^2 I\)
\(\hat{\beta} = (X^T \hat{V}^{-1} X)^{-1} X^T \hat{V}^{-1} y\)
\(var(\hat{\beta}) = (X^T \hat{V}^{-1} X)^{-1}\)
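The generalized least squares (GLS) step can be sketched as follows. This is a simplification: the variance components are treated as known rather than estimated (e.g. by REML). The example uses hypothetical sib pairs with genotypes drawn independently, which ignores their within-family correlation but does not affect the GLS algebra:

```python
import numpy as np

def gls(X, y, V):
    """GLS estimate beta = (X' V^-1 X)^-1 X' V^-1 y and its covariance (X' V^-1 X)^-1."""
    Vinv_X = np.linalg.solve(V, X)        # V^-1 X, without forming V^-1 explicitly
    Vinv_y = np.linalg.solve(V, y)        # V^-1 y
    cov_beta = np.linalg.inv(X.T @ Vinv_X)
    return cov_beta @ (X.T @ Vinv_y), cov_beta

# Illustration: hypothetical sib pairs, V built from known variance components
rng = np.random.default_rng(3)
n_pairs = 500
K = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))
V = 0.4 * K + 0.6 * np.eye(2 * n_pairs)
x_g = rng.binomial(2, 0.3, size=2 * n_pairs).astype(float)
X = np.column_stack([np.ones(2 * n_pairs), x_g])          # marginal model: mu + beta_g x_g
y = X @ np.array([1.0, 0.2]) + np.linalg.cholesky(V) @ rng.standard_normal(2 * n_pairs)
beta_hat, cov_beta = gls(X, y, V)
se = np.sqrt(np.diag(cov_beta))
```

With \(V = I\) the estimator reduces to ordinary least squares, which is a convenient sanity check.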
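For the power derivation, \(var(\hat{\beta}_x)\), the diagonal element of \((X^T \hat{V}^{-1} X)^{-1}\) for the tested predictor \(x\), needs to be extracted
\(Z^2_x = \hat{\beta}_x^2 / var(\hat{\beta}_x) \simeq \chi^2_1\) under \(H_0: \beta_x = 0\)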
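As a small worked example, the Wald statistic and its 1-df chi-square p-value can be computed with the standard library only. This relies on the identity \(P(\chi^2_1 > z^2) = P(|N(0,1)| > |z|) = \mathrm{erfc}(\sqrt{z^2/2})\); the input values are hypothetical:

```python
import math

def wald_chi2_pvalue(beta_hat, var_beta_hat):
    """Wald statistic Z^2 = beta^2 / var(beta) and its p-value under a chi-square(1) null."""
    z2 = beta_hat ** 2 / var_beta_hat
    # For 1 df: P(chi2_1 > z2) = P(|N(0, 1)| > sqrt(z2)) = erfc(sqrt(z2 / 2))
    return z2, math.erfc(math.sqrt(z2 / 2.0))

z2, p = wald_chi2_pvalue(0.05, 0.01 ** 2)   # hypothetical estimate and variance, Z = 5
```

Here \(Z^2 = 25\) and the p-value is about \(5.7 \times 10^{-7}\), just above the conventional genome-wide threshold of \(5 \times 10^{-8}\).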
Simplify to a one-covariate model by orthogonalization
\(y^*\), centered \(y\)
\(x^*_g\), centered \(x_g\)
\(var(\hat{\beta}_g) = ({x^*_g}^T \hat{V}^{-1} x^*_g)^{-1}\)
\(y^*\), centered \(y\)
\(x^*_{ge}\), centered \((x_g - \mu_g) (x_e - \mu_e)\)
\(var(\hat{\beta}_{int}) = ({x^*_{ge}}^T \hat{V}^{-1} x^*_{ge})^{-1}\)
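A quick numerical check of the orthogonalization step, under hypothetical sib-pair data with independent genotypes and exposure. It compares \(var(\hat{\beta}_{int})\) from the full four-column GLS model with the one-covariate form built from the centered product. Because \(x_g x_e\) differs from the centered product only by a linear combination of the other columns, the full-model variance is never smaller. The gap shrinks as \(N\) grows:

```python
import numpy as np

def var_int_full(x_g, x_e, V):
    """var(beta_int) from the full GLS model with intercept, x_g, x_e and x_g*x_e."""
    X = np.column_stack([np.ones_like(x_g), x_g, x_e, x_g * x_e])
    return np.linalg.inv(X.T @ np.linalg.solve(V, X))[3, 3]

def var_int_orthogonalized(x_g, x_e, V):
    """One-covariate approximation (x*' V^-1 x*)^-1, x* = centered product of centered predictors."""
    x_ge = (x_g - x_g.mean()) * (x_e - x_e.mean())
    x_star = x_ge - x_ge.mean()
    return 1.0 / (x_star @ np.linalg.solve(V, x_star))

rng = np.random.default_rng(3)
n_pairs = 1000
K = np.kron(np.eye(n_pairs), np.array([[1.0, 0.5], [0.5, 1.0]]))   # hypothetical sib pairs
V = 0.5 * K + 0.5 * np.eye(2 * n_pairs)
x_g = rng.binomial(2, 0.3, size=2 * n_pairs).astype(float)
x_e = rng.binomial(1, 0.4, size=2 * n_pairs).astype(float)
full = var_int_full(x_g, x_e, V)
approx = var_int_orthogonalized(x_g, x_e, V)
```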
The power as a function of the non-centrality parameter (NCP)
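\(NCP = \beta^2 \, x^T \hat{V}^{-1} x \approx \beta^2 tr(\hat{V}^{-1} \Sigma_x)\), since \(var(\hat{\beta}) = (x^T \hat{V}^{-1} x)^{-1}\) and, for centered \(x\), \(E(x^T \hat{V}^{-1} x) = tr(\hat{V}^{-1} \Sigma_x)\)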
| Data | Distribution |
|---|---|
| outcome | \(y \sim (X \beta, V) = (X \beta, \sigma_f^2 F + \sigma_g^2 K + \sigma_r^2 I)\) |
| predictor | \(x \sim (\mu_x, \Sigma_x)\) |
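Given an NCP, power follows from the non-central 1-df chi-square distribution, which is the square of \(N(\sqrt{NCP}, 1)\). The sketch below uses only the standard library. The default \(\alpha = 5 \times 10^{-8}\) (genome-wide significance) and the example effect size, allele frequency, and sample size are assumptions for illustration:

```python
from statistics import NormalDist

def power_from_ncp(ncp, alpha=5e-8):
    """Power of a 1-df chi-square test with non-centrality parameter ncp at level alpha.

    A 1-df non-central chi-square is the square of N(sqrt(ncp), 1), so
    power = P(|Z + sqrt(ncp)| > z_{1 - alpha/2}) with Z ~ N(0, 1).
    """
    std = NormalDist()
    crit = std.inv_cdf(1 - alpha / 2)
    shift = ncp ** 0.5
    return std.cdf(shift - crit) + std.cdf(-shift - crit)

# Hypothetical marginal test in unrelated samples: NCP = beta^2 * 2pq * N (normalized V = I)
beta, p, n = 0.05, 0.3, 50_000
print(power_from_ncp(beta ** 2 * 2 * p * (1 - p) * n))
```

At \(NCP = 0\) the function returns \(\alpha\), and at \(NCP = (1.96 + 0.84)^2 \approx 7.85\) with \(\alpha = 0.05\) it returns about 0.80, the usual textbook benchmark.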
Approximation using quadratic forms
If \(x\) is a vector of random variables, the quadratic form \(x^TAx\) is a scalar random variable.
If \(x\) has mean \(\mu\) and (nonsingular) covariance matrix \(V\), then
\(E(x^TAx) = tr(AV) + \mu^T A \mu\)
\(\sigma^2(x^TAx) = 2tr(AVAV) + 4\mu^T AVA \mu\) (for multivariate normal \(x\) and symmetric \(A\))
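These two moment formulas (Lynch and Walsh, 1998) can be checked by Monte Carlo simulation. The sketch below uses an arbitrary symmetric \(A\), mean \(\mu\), and covariance \(V\) chosen for illustration, with \(x\) drawn from a multivariate normal as the variance formula requires:

```python
import numpy as np

rng = np.random.default_rng(4)
draws = 200_000
A = np.array([[2.0, 0.5, 0.0], [0.5, 1.0, 0.3], [0.0, 0.3, 1.5]])   # symmetric
mu = np.array([1.0, -0.5, 0.2])
V = np.array([[1.0, 0.4, 0.1], [0.4, 1.0, 0.2], [0.1, 0.2, 1.0]])   # covariance of x

x = rng.multivariate_normal(mu, V, size=draws)
q = np.einsum("ij,jk,ik->i", x, A, x)          # x_i^T A x_i for every draw

mean_theory = np.trace(A @ V) + mu @ A @ mu
var_theory = 2 * np.trace(A @ V @ A @ V) + 4 * mu @ A @ V @ A @ mu
print(q.mean(), mean_theory, q.var(), var_theory)
```

The expectation formula holds for any distribution with the given mean and covariance; only the variance formula needs normality.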
(Lynch and Walsh, 1998)

| structure | \(\Sigma_y\) = \(V\) | \(\Sigma_{x_g}\) | \(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_{x_g})\) |
|---|---|---|---|
| unrelated | \((\sigma_g^2 + \sigma_r^2) I\) | \(\sigma^2_{x_g} I\) | \(\beta^2 \mbox{ } 2pq \mbox{ } N / (\sigma_g^2 + \sigma_r^2)\) |
| genetically related | \(\sigma_g^2 K + \sigma_r^2 I\) | \(\sigma^2_{x_g} K\) | \(\beta^2 2pq \mbox{ } tr((\hat{\sigma}_g^2 K + \hat{\sigma}_r^2 I)^{-1} K)\) |
| shared environment | \(\sigma_f^2 F + \sigma_r^2 I\) | \(\sigma^2_{x_g} I\) | \(\beta^2 2pq \mbox{ } tr((\hat{\sigma}_f^2 F + \hat{\sigma}_r^2 I)^{-1})\) |
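The trace terms in this table can be evaluated numerically. The sketch assumes \(\sigma^2_{x_g} = 2pq\) (Hardy-Weinberg) and variance components normalized to sum to 1. It also assumes hypothetical nuclear families of two unrelated parents and two full sibs. Because \(V\) and \(\Sigma_{x_g}\) are block diagonal, each trace is the number of families times the trace for one family block:

```python
import numpy as np

# Hypothetical nuclear-family blocks: 2 parents + 2 full sibs (K = 2 x kinship);
# F = 1 for every pair of members sharing the household.
K_BLOCK = np.array([[1.0, 0.0, 0.5, 0.5],
                    [0.0, 1.0, 0.5, 0.5],
                    [0.5, 0.5, 1.0, 0.5],
                    [0.5, 0.5, 0.5, 1.0]])
F_BLOCK = np.ones((4, 4))
I_BLOCK = np.eye(4)

def ncp_marginal(beta, p, n_families, structure, s2_g=0.0, s2_f=0.0):
    """NCP ~ beta^2 * 2pq * tr(V^-1 R), where Sigma_xg = 2pq * R (R = I or K).

    Variance components are normalized: s2_g + s2_f + s2_r = 1.
    """
    s2_r = 1.0 - s2_g - s2_f
    if structure == "unrelated":
        V, R = (s2_g + s2_r) * I_BLOCK, I_BLOCK
    elif structure == "genetic":
        V, R = s2_g * K_BLOCK + s2_r * I_BLOCK, K_BLOCK
    elif structure == "shared_env":
        V, R = s2_f * F_BLOCK + s2_r * I_BLOCK, I_BLOCK
    else:
        raise ValueError(f"unknown structure: {structure}")
    trace_one_family = np.trace(np.linalg.solve(V, R))
    return beta ** 2 * 2 * p * (1 - p) * n_families * trace_one_family
```

For example, `ncp_marginal(0.1, 0.3, 1000, "genetic", s2_g=0.5)` can be compared with the `"unrelated"` case for the same 4,000 individuals.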
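Under the assumption that \(x_g\) and \(x_e\) are independent, and that \(x_e\) is independent across individuals:
\(\Sigma_{x_{ge}} = (\sigma^2_{x_g} K) \circ (\sigma^2_{x_e} I) = \sigma^2_{x_g} \sigma^2_{x_e} I\), where \(\circ\) is the elementwise (Hadamard) product and \(\mbox{diag}(K) = 1\) (no inbreeding)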
| structure | \(\Sigma_y\) = \(V\) | \(\Sigma_{x_{ge}}\) | \(NCP \approx \beta^2 tr(\hat{V}^{-1} \Sigma_{x_{ge}})\) |
|---|---|---|---|
| unrelated | \((\sigma_g^2 + \sigma_r^2) I\) | \(\sigma^2_{x_g} \sigma^2_{x_e} I\) | \(\beta^2 \mbox{ } 2pq \mbox{ } f (1 -f) \mbox{ } N / (\sigma_g^2 + \sigma_r^2)\) |
| genetically related | \(\sigma_g^2 K + \sigma_r^2 I\) | \(\sigma^2_{x_g} \sigma^2_{x_e} I\) | \(\beta^2 2pq \mbox{ } f (1 -f) \mbox{ } tr((\hat{\sigma}_g^2 K + \hat{\sigma}_r^2 I)^{-1})\) |
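A sketch of the interaction NCP under the same hypothetical nuclear-family blocks. It assumes \(\sigma^2_{x_g} = 2pq\), a binary exposure with frequency \(f\) so that \(\sigma^2_{x_e} = f(1-f)\), and \(\sigma_g^2 + \sigma_r^2 = 1\). It first checks the Hadamard-product step (a unit diagonal of \(K\) turns \(K\) into \(I\)) and then evaluates \(tr(V^{-1})\) per family:

```python
import numpy as np

K_BLOCK = np.array([[1.0, 0.0, 0.5, 0.5],      # hypothetical family: 2 parents + 2 sibs
                    [0.0, 1.0, 0.5, 0.5],
                    [0.5, 0.5, 1.0, 0.5],
                    [0.5, 0.5, 0.5, 1.0]])

def ncp_gxe(beta, p, f, n_families, s2_g):
    """NCP ~ beta^2 * 2pq * f(1-f) * tr((s2_g K + s2_r I)^-1), with s2_g + s2_r = 1."""
    V = s2_g * K_BLOCK + (1 - s2_g) * np.eye(4)
    return beta ** 2 * 2 * p * (1 - p) * f * (1 - f) * n_families * np.trace(np.linalg.inv(V))

# Hadamard step: (2pq K) o (f(1-f) I) keeps only the unit diagonal of K
p, f = 0.3, 0.4
sigma_xge = (2 * p * (1 - p) * K_BLOCK) * (f * (1 - f) * np.eye(4))   # elementwise product
```

Note that `*` between NumPy arrays is the elementwise product, matching \(\circ\), whereas `@` would give the ordinary matrix product.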
Data simulation of the whole sample (nuclear families):
For genetically related samples, marginal test: \(NCP = \beta^2 2pq \mbox{ } tr((\hat{\sigma}_g^2 K + \hat{\sigma}_r^2 I)^{-1} K)\)
Our formula also allows us to explore performance in further settings:
For genetically related samples, GxE test: \(NCP = \beta^2 2pq \mbox{ } f (1 -f) \mbox{ } tr((\hat{\sigma}_g^2 K + \hat{\sigma}_r^2 I)^{-1})\)
Data simulation of the whole sample (genetically unrelated, but related by shared env.):
For a shared environment: \(NCP = \beta^2 2pq \mbox{ } tr((\hat{\sigma}_f^2 F + \hat{\sigma}_r^2 I)^{-1})\)
Marginal analysis
GxE interaction analysis
Ongoing work on more realistic scenarios
| Strata | Stratified interaction test | Reference |
|---|---|---|
| Independent strata | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2}} \sim \mathcal{N}(0, 1)\) | (Magi et al., 2010) |
| Related strata | \(Z_{int} = \frac{\beta_m - \beta_f}{\sqrt{\sigma_{\beta_m}^2 + \sigma_{\beta_f}^2 - 2 r \sigma_{\beta_m} \sigma_{\beta_f}}} \sim \mathcal{N}(0, 1)\) | (Randall et al., 2013) |
\(r\) is the Spearman correlation between the effect estimates of the two strata; the term \(-2 r \sigma_{\beta_m} \sigma_{\beta_f}\) removes their covariance from \(var(\beta_m - \beta_f)\)
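A minimal sketch of the stratified test, using \(var(\beta_m - \beta_f) = \sigma_{\beta_m}^2 + \sigma_{\beta_f}^2 - 2 r \sigma_{\beta_m} \sigma_{\beta_f}\). Setting `r = 0` gives the independent-strata version. The effect estimates and standard errors in the example are hypothetical:

```python
from statistics import NormalDist

def stratified_z(beta_m, se_m, beta_f, se_f, r=0.0):
    """Z statistic and two-sided p-value for the difference of two stratum-specific effects.

    r = 0: independent strata; r != 0: correlated strata, covariance r * se_m * se_f.
    """
    z = (beta_m - beta_f) / (se_m ** 2 + se_f ** 2 - 2 * r * se_m * se_f) ** 0.5
    p = 2 * NormalDist().cdf(-abs(z))
    return z, p

z0, p0 = stratified_z(0.10, 0.02, 0.04, 0.02)           # strata treated as independent
z_r, p_r = stratified_z(0.10, 0.02, 0.04, 0.02, r=0.167)  # positively correlated strata
```

With a positive correlation, ignoring \(r\) overstates \(var(\beta_m - \beta_f)\), so the independent-strata test is conservative.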
Data simulation of the whole sample (nuclear families + shared environment):
Output: \(\rho = 0.167\) between strata
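Bear in mind the cross-trait LD score regression result for two outcomes (Bulik-Sullivan et al., 2015); its second (intercept) term shows that \(N_s\) overlapping samples with phenotypic correlation \(\rho\) correlate the two sets of Z-scores even when \(\rho_g = 0\):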
\(E[Z_{1j} Z_{2j}] = \frac{\sqrt{N_1 N_2} {\rho}_g}{M} l_{j} + \frac{N_s \rho}{\sqrt{N_1 N_2}}\)
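The expectation can be written as a small function to see how the two terms trade off. The parameter values used in the checks are hypothetical: with complete sample overlap and no genetic correlation, the expected product reduces to \(\rho\), and with no overlap the intercept term vanishes:

```python
import math

def expected_z_product(n1, n2, n_s, rho_g, rho, m, ld_score):
    """E[Z1j Z2j] = sqrt(N1 N2) rho_g l_j / M + N_s rho / sqrt(N1 N2)."""
    root = math.sqrt(n1 * n2)
    genetic_term = root * rho_g * ld_score / m     # grows with the LD score l_j
    overlap_term = n_s * rho / root                # intercept: sample overlap only
    return genetic_term + overlap_term
```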
Abbreviations: COPD, Chronic obstructive pulmonary disease
Previous studies reported
The project aims to leverage ancestry information in GxE tests
\(y = \mu + x_g \beta_g + x_e \beta_e + x_g \times x_e \beta_{int} + a_g \beta_1 + a_l \beta_2 + a_g \times x_e \beta_3 + a_l \times x_e \beta_4\) (\(\beta_{int} = 0\))
\(y = \mu + a_g \beta_1 + a_l \beta_2 + a_g \times x_e \beta_3 + a_l \times x_e \beta_4\) (\(\beta_4 = 0\))
Marginal scan replicates the locus on Chr 12, gene FAM19A2 (Parker et al., 2014)
G x current-smoker scan suggests a locus on Chr 11, gene PARVA, previously reported by Wan et al. (2015) in a study of smoking-associated site-specific differential methylation in buccal mucosa in COPDGene
Simulation of the null outcome: \(y_{null} = \mu + \beta a_{l} + e\)
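A rough sketch of a null simulation for the ancestry-based test of \(\beta_4 = 0\) in the model \(y = \mu + a_g \beta_1 + a_l \beta_2 + a_g \times x_e \beta_3 + a_l \times x_e \beta_4\). The sketch makes several assumptions. \(a_g\) is read as a global ancestry proportion and \(a_l\) as the local ancestry count (0, 1, 2) at the tested locus. Individuals are unrelated, so plain OLS is used instead of the LMM. The exposure is binary. Under this null the Wald Z-scores should look roughly standard normal:

```python
import numpy as np

def wald_z(X, y, col):
    """OLS Wald Z for column `col` of design matrix X (unrelated individuals assumed)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta[col] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[col, col])

rng = np.random.default_rng(5)
n, n_rep = 2000, 200
z_scores = []
for _ in range(n_rep):
    a_g = rng.beta(2, 8, size=n)                       # hypothetical global ancestry proportion
    a_l = rng.binomial(2, a_g).astype(float)           # hypothetical local ancestry copies
    x_e = rng.binomial(1, 0.3, size=n).astype(float)   # hypothetical binary exposure
    y_null = 1.0 + 0.3 * a_l + rng.normal(size=n)      # null outcome: no a_l x x_e effect
    X = np.column_stack([np.ones(n), a_g, a_l, a_g * x_e, a_l * x_e])
    z_scores.append(wald_z(X, y_null, col=4))          # test of beta_4 = 0
z_scores = np.array(z_scores)
print(np.mean(np.abs(z_scores) > 1.96))                # empirical type I error at 5%
```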
| Data | Relationship Matrix | Method |
|---|---|---|
| Genotypes (2M) | GRM | LMM |
| Local ancestry (40K) | ARM | LMM |
| | Study design 1 | Study design 2 | Study design 3 |
|---|---|---|---|
| Sample | Family-based | Population-based | Population-based |
| Relationships | Kinship | GRM | |
| Method | Linear mixed models | Linear models | Linear mixed models |
GxE in study design 3 is our ongoing work (not presented today)
GxE in study designs 1 vs. 2 (today's focus)
Given: a population of 50,000 related individuals (nuclear families)
Experiment: sample 5,000 unrelated individuals, or sample 5,000 individuals at random
| relatedness | \(V\) | \(\Sigma_x\) | Normalization |
|---|---|---|---|
| unrelated | \(\sigma_g^2 K + \sigma_r^2 I = (\sigma_g^2 + \sigma_r^2) I\) | \(\sigma_x^2 I\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |
| genetically related | \(\sigma_g^2 K + \sigma_r^2 I\) | \(\sigma_x^2 K\) | \(\sigma_g^2 + \sigma_r^2 = 1\) |
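The sampling experiment can be sketched through its trace terms alone, with \(\sigma_g^2 + \sigma_r^2 = 1\). The sketch assumes hypothetical families of two parents and two full sibs (12,500 families for 50,000 people) and \(\sigma_g^2 = 0.5\). The "unrelated" design takes one person from each of 5,000 distinct families. For each design it sums \(tr(V^{-1} K)\) (marginal test) and \(tr(V^{-1})\) (GxE test) over the sampled members of each family:

```python
import numpy as np

K_BLOCK = np.array([[1.0, 0.0, 0.5, 0.5],      # hypothetical family: 2 parents + 2 sibs
                    [0.0, 1.0, 0.5, 0.5],
                    [0.5, 0.5, 1.0, 0.5],
                    [0.5, 0.5, 0.5, 1.0]])

def trace_term(members_per_family, s2_g, kind):
    """Sum over families of tr(V_s^-1 R_s) for the sampled members of each family.

    kind="marginal" uses R = K (Sigma_x proportional to K); kind="gxe" uses R = I.
    """
    total = 0.0
    for members in members_per_family:
        if not members:
            continue
        k = K_BLOCK[np.ix_(members, members)]
        v = s2_g * k + (1 - s2_g) * np.eye(len(members))
        r = k if kind == "marginal" else np.eye(len(members))
        total += np.trace(np.linalg.solve(v, r))
    return total

rng = np.random.default_rng(6)
n_families, n_sample, s2_g = 12_500, 5_000, 0.5
unrelated = [[0]] * n_sample                     # one parent from each of 5,000 families
picked = rng.choice(n_families * 4, size=n_sample, replace=False)
random_members = [[] for _ in range(n_families)]
for idx in picked:
    random_members[idx // 4].append(idx % 4)

results = {name: (trace_term(d, s2_g, "marginal"), trace_term(d, s2_g, "gxe"))
           for name, d in [("unrelated", unrelated), ("random", random_members)]}
print(results)
```

Multiplying each trace by \(\beta^2 2pq\) (marginal) or \(\beta^2 2pq f(1-f)\) (GxE) gives the corresponding NCP.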
The Genetic Analysis of Idiopathic Thrombophilia 2 (GAIT2) Project
Developed tools for analysis of family-based samples